Image source: ZME Science
The SARS-CoV-2 virus causes coronavirus disease (COVID-19), an infectious disease. Most people infected with the virus have mild to moderate symptoms and recover without any additional therapy. Some, on the other hand, become critically unwell and require medical assistance.
The virus can spread from an infected person’s mouth or nose in small liquid particles when they cough, sneeze, speak, sing or breathe. These particles range from larger respiratory droplets to smaller aerosols. You can be infected by breathing in the virus if you are near someone who has COVID-19, or by touching a contaminated surface and then your eyes, nose or mouth. The virus spreads more easily indoors and in crowded settings.
This report is an analysis of the novel coronavirus (COVID-19) around the world. It demonstrates data processing and visualization, insights and prediction.
We are given three main datasets.
As a first step, let’s look at each of them. The first table holds the country name, latitude and longitude, followed by the number of confirmed cases over time. The second and third datasets have the same layout and hold the death and recovery counts over time.
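# Packages used throughout this report
library(dplyr)   # %>%, group_by(), summarise(), filter()
library(tidyr)   # pivot_longer(), separate()
library(ggplot2) # bar and line plots
library(leaflet) # interactive maps
raw.data.confirmed <- read.csv('time_series_covid19_confirmed_global.csv')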
head(raw.data.confirmed, n=5L)
raw.data.deaths <- read.csv('time_series_covid19_deaths_global.csv')
head(raw.data.deaths, n=5L)
raw.data.recovered <- read.csv('time_series_covid19_recovered_global.csv')
head(raw.data.recovered, n=5L)
Here we can observe a discrepancy in the last data frame. The first two data frames consist of 284 observations of 825 variables, while the third data frame consists of 269 observations of 825 variables. This means that recovery information is not available for some rows. We can also observe that the Province.State column is empty for the rows shown, so we can drop that column wherever the analysis works at country level.
str(raw.data.confirmed,list.len=10)
## 'data.frame': 284 obs. of 825 variables:
## $ Province.State: chr "" "" "" "" ...
## $ Country.Region: chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
## $ Lat : num 33.9 41.2 28 42.5 -11.2 ...
## $ Long : num 67.71 20.17 1.66 1.52 17.87 ...
## $ X1.22.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.23.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.24.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.25.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.26.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.27.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## [list output truncated]
str(raw.data.deaths,list.len=10)
## 'data.frame': 284 obs. of 825 variables:
## $ Province.State: chr "" "" "" "" ...
## $ Country.Region: chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
## $ Lat : num 33.9 41.2 28 42.5 -11.2 ...
## $ Long : num 67.71 20.17 1.66 1.52 17.87 ...
## $ X1.22.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.23.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.24.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.25.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.26.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.27.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## [list output truncated]
str(raw.data.recovered, list.len=10)
## 'data.frame': 269 obs. of 825 variables:
## $ Province.State: chr "" "" "" "" ...
## $ Country.Region: chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
## $ Lat : num 33.9 41.2 28 42.5 -11.2 ...
## $ Long : num 67.71 20.17 1.66 1.52 17.87 ...
## $ X1.22.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.23.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.24.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.25.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.26.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X1.27.20 : int 0 0 0 0 0 0 0 0 0 0 ...
## [list output truncated]
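To find out which rows account for a row-count mismatch like the one noted above, one option is to compare the province/country keys of two tables. The following is a minimal sketch on made-up toy data, not the real files; the helper `make_key` and the region and country names are hypothetical.

```r
# Toy stand-ins for two of the tables (names and rows are illustrative only).
confirmed <- data.frame(
  Province.State = c("", "Region A", "Region B"),
  Country.Region = c("Country X", "Country Y", "Country Y"),
  stringsAsFactors = FALSE
)
recovered <- data.frame(
  Province.State = c("", ""),
  Country.Region = c("Country X", "Country Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one key per row from province and country.
make_key <- function(df) paste(df$Province.State, df$Country.Region, sep = "|")

# Keys present in the confirmed table but absent from the recovered table.
missing_in_recovered <- setdiff(make_key(confirmed), make_key(recovered))
print(missing_in_recovered)
```

Running the same comparison on the real data frames would list the rows that have confirmed counts but no matching recovery record.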
As we can see, the columns are dates and each row holds the corresponding confirmed, death or recovery counts. For proper analysis, let’s reshape the data into a longer format.
raw.data.confirmed <- raw.data.confirmed %>% pivot_longer(cols = starts_with("X"), names_to = "Date", values_to = "confirmed_count")
raw.data.confirmed
raw.data.deaths <- raw.data.deaths %>% pivot_longer(cols = starts_with("X"), names_to = "Date", values_to = "death_count")
raw.data.deaths
raw.data.recovered <- raw.data.recovered %>% pivot_longer(cols = starts_with("X"), names_to = "Date", values_to = "recovered_count")
raw.data.recovered
Now each row holds a date and the corresponding count (confirmed, death or recovery) for that date. The date values still carry the X prefix from the original column names, so we strip it next.
raw.data.confirmed$Date <- substr(raw.data.confirmed$Date,2,20)
raw.data.confirmed
raw.data.deaths$Date <- substr(raw.data.deaths$Date,2,20)
raw.data.deaths
raw.data.recovered$Date <- substr(raw.data.recovered$Date,2,20)
raw.data.recovered
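Once the prefix is removed, the remaining text can be parsed into real dates. The sketch below uses three sample column names in the same style as the data; it assumes the month.day.year layout with a two-digit year, which is why the format string uses `%y`.

```r
# Sample column names in the style produced by read.csv (illustrative values).
raw_names <- c("X1.22.20", "X12.31.20", "X3.5.21")

# Drop the leading "X" and parse month.day.two-digit-year.
date_text <- sub("^X", "", raw_names)
parsed <- as.Date(date_text, format = "%m.%d.%y")
print(parsed)

# Real dates sort chronologically, unlike the raw text.
print(order(parsed))
```

Working with `Date` values rather than strings makes ordering and date arithmetic reliable in the later steps.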
Now let’s merge the three data frames for easier comparison of the data.
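# Inner joins on the shared key columns; rows without a match in every table are dropped
data <- merge(x = raw.data.confirmed , y = raw.data.deaths, by = c("Province.State","Country.Region","Lat","Long","Date"))
data <- merge(x = data , y = raw.data.recovered, by = c("Province.State","Country.Region","Lat","Long","Date"))
# The year has two digits, so it is parsed with %y rather than %Y
data <- data[order(as.Date(data$Date, format="%m.%d.%y")),]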
data
Here we merged the three data frames and ordered the rows by date. Next, we split the date into month, day and year columns.
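Note that `merge()` performs an inner join by default, so rows present in one table but not the other disappear from the result. The sketch below, on made-up data with a hypothetical `key` column, contrasts the default with `all.x = TRUE`, which keeps unmatched rows and fills the gaps with `NA`.

```r
# Toy tables: key "C" has no recovery record (values are illustrative).
confirmed <- data.frame(key = c("A", "B", "C"), confirmed_count = c(10, 20, 30))
recovered <- data.frame(key = c("A", "B"), recovered_count = c(1, 2))

# Default: inner join, only keys found in both tables survive.
inner <- merge(confirmed, recovered, by = "key")

# Left join: keep every confirmed row, with NA where recovery data is missing.
left <- merge(confirmed, recovered, by = "key", all.x = TRUE)

print(inner)
print(left)
```

Which variant is appropriate depends on whether rows without recovery data should stay in the analysis.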
data <- separate(data, Date, into=c("month", "day", "year"), sep="\\.", remove = FALSE)
data
From here onwards we analyze the data. The analysis is organized by perspective, that is, each part looks at the data from a different point of view.
First, let’s store our data in a world_data variable so that we can manipulate it without disturbing the base data.
world_data <- data
world_data_by_countries <- world_data %>% group_by(Country.Region) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count),
Lat=max(Lat),
Long=max(Long))
world_data_by_countries
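One thing to keep in mind with this kind of summary: for a country that is split into several provinces, taking the maximum over all rows returns the largest single province rather than the national total. A possible alternative, sketched here in base R on made-up data, is to sum the provinces per country and date first and only then take the maximum. The country and province names are hypothetical.

```r
# Toy long-format data: Country X is split into two provinces (illustrative values).
long <- data.frame(
  Country.Region  = c("Country X", "Country X", "Country X", "Country X",
                      "Country Y", "Country Y"),
  Province.State  = c("P1", "P1", "P2", "P2", "", ""),
  Date            = c("2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02",
                      "2020-03-01", "2020-03-02"),
  confirmed_count = c(5, 8, 2, 4, 1, 3),
  stringsAsFactors = FALSE
)

# Step 1: national total per day (sum over provinces).
per_day <- aggregate(confirmed_count ~ Country.Region + Date, data = long, FUN = sum)

# Step 2: latest cumulative value per country (maximum over days).
per_country <- aggregate(confirmed_count ~ Country.Region, data = per_day, FUN = max)
print(per_country)
```

In this toy example the direct maximum for Country X would be 8 (the largest province), whereas the two-step version gives the national total of 12.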
Here you can see a severe discrepancy between the confirmed cases and the corresponding death and recovery figures. This happens because, after a certain point in time, death and recovery data are no longer available in the database.
Now, let’s explore some quick statistics about the novel coronavirus.
Here are the top 20 countries with the highest number of confirmed cases.
top_20_countries_c <- top_n(world_data_by_countries, 20, confirmed)
confirmed_plot <- ggplot(top_20_countries_c, aes(x=Country.Region, y=confirmed)) + geom_bar(stat="identity", width=0.7, position = "dodge", aes(fill=confirmed)) + coord_flip() + scale_fill_continuous(type = "viridis") + scale_y_log10() + labs(x="\nCountry", y="Confirmed cases\n") + theme_bw() +
theme(axis.text.x=element_text(angle=45, vjust=0.5))
confirmed_plot
top_20_countries_r <- top_n(world_data_by_countries, 20, recovered)
recovered_plot <- ggplot(top_20_countries_r, aes(x=Country.Region, y=recovered)) + geom_bar(stat="identity", width=0.7, position = "dodge", aes(fill=recovered)) + coord_flip() + scale_fill_continuous(type = "viridis") + scale_y_log10() + labs(x="\nCountry", y="Recovered cases\n") + theme_bw() +
theme(axis.text.x=element_text(angle=45, vjust=0.5))
recovered_plot
Here are the top 20 countries with the highest number of deaths.
top_20_countries_d <- top_n(world_data_by_countries, 20, death)
death_plot <- ggplot(top_20_countries_d, aes(x=Country.Region, y=death)) + geom_bar(stat="identity", width=0.7, position = "dodge", aes(fill=death)) + coord_flip() + scale_fill_continuous(type = "viridis") + scale_y_log10() + labs(x="\nCountry", y="Death count\n") + theme_bw() +
theme(axis.text.x=element_text(angle=45, vjust=0.5))
death_plot
Now, let’s see the countries and regions least affected by COVID-19.
least_20_countries_c <- top_n(world_data_by_countries, -20, confirmed)
least_c <- ggplot(least_20_countries_c, aes(x=Country.Region, y=confirmed)) + geom_bar(stat="identity", width=0.7, position = "dodge", aes(fill=confirmed)) + coord_flip()
least_c
least_20_countries_r <- top_n(world_data_by_countries, -20, recovered)
least_r <- ggplot(least_20_countries_r, aes(x=Country.Region, y=recovered)) + geom_bar(stat="identity", width=0.7, position = "dodge", aes(fill=recovered)) + coord_flip()
least_r
least_20_countries_d <- top_n(world_data_by_countries, -20, death)
least_d <- ggplot(least_20_countries_d, aes(x=Country.Region, y=death)) + geom_bar(stat="identity", width=0.7, position = "dodge", aes(fill=death)) + coord_flip()
least_d
### Plot - Top Affected (Map)
top_20_countries_d
leaflet(options=leafletOptions(dragging=FALSE, minzoom=18, maxzoom=18, nowrap=TRUE)) %>% addProviderTiles("CartoDB", group="CartoBD") %>%
addCircleMarkers(data = top_20_countries_c, lng = ~Long, lat = ~Lat, label = ~Country.Region, radius= 0.2, group="Top 20 confirmed") %>%
addCircleMarkers(data = top_20_countries_r, lng = ~Long, lat = ~Lat, label = ~Country.Region, radius= 0.2, group="Top 20 recovered") %>%
addCircleMarkers(data = top_20_countries_d, lng = ~Long, lat = ~Lat, label = ~Country.Region, radius= 0.2, group="Top 20 death") %>%
addCircleMarkers(data = least_20_countries_c, lng = ~Long, lat = ~Lat, label = ~Country.Region, radius= 0.2, group="Least 20 confirmed") %>%
addCircleMarkers(data= least_20_countries_r, lng = ~Long, lat = ~Lat, label = ~Country.Region, radius= 0.2, group="Least 20 recovered") %>%
addCircleMarkers(data= least_20_countries_d, lng = ~Long, lat = ~Lat, label = ~Country.Region, radius= 0.2, group="Least 20 deaths") %>%
addLayersControl(baseGroups = c("Top 20 confirmed","Top 20 recovered","Top 20 death","Least 20 confirmed", "Least 20 recovered", "Least 20 deaths"), options = layersControlOptions(collapsed = FALSE))
Now, let’s analyze how it all began and how the virus progressed.
First, let’s filter the COVID-19 data for the first six months.
world_data <- data
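world_data <- transform(world_data, month = as.numeric(month),
year = as.numeric(year), day=as.numeric(day)) %>%
# Standardize the counts and multiply by 10; as.vector() drops the matrix attributes returned by scale()
mutate_at(c("confirmed_count"), ~ as.vector(scale(.)) * 10)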
first_months <- world_data %>% filter(year==20) %>% filter(month<=1) %>% group_by(Country.Region) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count),
lat = mean(Lat),
long = mean(Long))
second_months <- world_data %>% filter(year==20) %>% filter(month<=2) %>% group_by(Country.Region) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count),
lat = mean(Lat),
long = mean(Long))
third_months <- world_data %>% filter(year==20) %>% filter(month<=3) %>% group_by(Country.Region) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count),
lat = mean(Lat),
long = mean(Long))
forth_months <- world_data %>% filter(year==20) %>% filter(month<=4) %>% group_by(Country.Region) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count),
lat = mean(Lat),
long = mean(Long))
fifth_months <- world_data %>% filter(year==20) %>% filter(month<=5) %>% group_by(Country.Region) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count),
lat = mean(Lat),
long = mean(Long))
sixth_months <- world_data %>% filter(year==20) %>% filter(month<=6) %>% group_by(Country.Region) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count),
lat = mean(Lat),
long = mean(Long))
second_6_months <- world_data %>% filter(year==20) %>% filter(month<=12) %>% group_by(Country.Region) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count),
lat = mean(Lat),
long = mean(Long))
world_data
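The monthly snapshots above all follow the same pattern: keep the rows up to a given month, then take the per-country maximum. As a sketch of how that repetition could be factored out, the hypothetical helper `snapshot_up_to` below does this in base R on made-up data. It is an illustration under those assumptions, not the code used to produce the maps.

```r
# Hypothetical helper: per-country maxima for all rows up to `last_month` of year `yr`.
snapshot_up_to <- function(df, last_month, yr = 20) {
  subset_df <- df[df$year == yr & df$month <= last_month, ]
  out <- aggregate(cbind(confirmed_count, death_count) ~ Country.Region,
                   data = subset_df, FUN = max)
  names(out) <- c("Country.Region", "confirmed", "death")
  out
}

# Toy cumulative data for two countries over three months (illustrative values).
toy <- data.frame(
  Country.Region  = c("A", "A", "A", "B", "B", "B"),
  year            = 20,
  month           = c(1, 2, 3, 1, 2, 3),
  confirmed_count = c(1, 5, 9, 0, 2, 4),
  death_count     = c(0, 1, 2, 0, 0, 1)
)

# One snapshot per month, stored in a named list.
snapshots <- lapply(1:3, function(m) snapshot_up_to(toy, m))
names(snapshots) <- paste0("month_", 1:3)
print(snapshots$month_2)
```

A list of snapshots like this can then be looped over when adding map layers, instead of naming each data frame by hand.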
Now let’s plot these snapshots on a map, with one layer per period.
library(leaflet)
pal = colorNumeric(
palette = "viridis",
domain = world_data$confirmed_count
)
leaflet() %>%
addProviderTiles("CartoDB", group="CartoBD",options=providerTileOptions(nowrap=TRUE)) %>%
addCircleMarkers(data = first_months, lng = ~long, lat = ~lat, label = ~Country.Region, color=~pal(first_months$confirmed), radius= ~confirmed*2, group="First month") %>%
addCircleMarkers(data = second_months, lng = ~long, lat = ~lat, label = ~Country.Region, color=~pal(second_months$confirmed), radius= ~confirmed*2, group="Second month") %>%
addCircleMarkers(data = third_months, lng = ~long, lat = ~lat, label = ~Country.Region, color=~pal(third_months$confirmed), radius= ~confirmed*2, group="Third month") %>%
addCircleMarkers(data = forth_months, lng = ~long, lat = ~lat, label = ~Country.Region, color=~pal(forth_months$confirmed), radius= ~confirmed*2, group="Fourth month") %>%
addCircleMarkers(data= fifth_months, lng = ~long, lat = ~lat, label = ~Country.Region, color=~pal(fifth_months$confirmed), radius= ~confirmed*2, group="Fifth month") %>%
addCircleMarkers(data= sixth_months, lng = ~long, lat = ~lat, label = ~Country.Region, color=~pal(sixth_months$confirmed), radius= ~confirmed*2, group="Sixth month") %>%
addCircleMarkers(data= second_6_months, lng = ~long, lat = ~lat, label = ~Country.Region, color=~pal(second_6_months$confirmed), radius= ~confirmed*2, group="last 6 months") %>%
addLayersControl(baseGroups = c("First month","Second month","Third month","Fourth month", "Fifth month", "Sixth month","last 6 months"), options = layersControlOptions(collapsed = FALSE))
# world_data <- data
# geo_code_merger <- select(geo_code, country, code) %>% group_by(country) %>% summarise(code=max(code))
# world_data_v2 <- merge(x = world_data , y = geo_code_merger, by.x = c("Country.Region"), by.y = ("country"))
# world_data_v2 <- world_data_v2[order(as.Date(world_data_v2$Date, format="%m.%d.%Y")),]
# df <- read.csv("graph.csv")
#p <- plot_geo(geo_code, locationmode = 'world') %>%
#add_trace( z = geo_code$new_cases_per_million, locations = geo_code$code, frame=geo_code$start_of_week,
#color = geo_code$new_cases_per_million) %>% colorbar(title = "Timeline")
# p
#export as html file
# htmlwidgets::saveWidget(p, file = "map.html")
16.22 % of world`s corona virus are from US
8.64 % of world`s corona virus are from India
6.08 % of world`s corona virus are from Brazil
5.48 % of world`s corona virus are from France
4.82 % of world`s corona virus are from Germany
4.39 % of world`s corona virus are from United Kingdom
3.58 % of world`s corona virus are from Russia
3.36 % of world`s corona virus are from Korea, South
3.20 % of world`s corona virus are from Italy
3.01 % of world`s corona virus are from Turkey
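Percentages of this kind can be obtained by dividing each country's total by the world total. The code that produced the figures above is not shown, so the following is only a sketch of one way to do it, using made-up totals and hypothetical country names.

```r
# Made-up confirmed totals per country (not the real figures).
totals <- c("Country A" = 50, "Country B" = 30, "Country C" = 20)

# Share of the overall total, in percent, largest first.
share <- sort(round(100 * totals / sum(totals), 2), decreasing = TRUE)

# Print one line per country.
for (nm in names(share)) {
  cat(sprintf("%.2f%% of confirmed cases are from %s\n", share[[nm]], nm))
}
```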
world_data <- data %>% filter(year!=22)
world_data_by_countries <- world_data %>% group_by(Country.Region, month) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count), .groups = 'drop') %>% arrange(month)
world_data_by_countries <- world_data_by_countries %>% group_by(month) %>%
summarise(confirmed = sum(confirmed),
death = sum(death),
recovered = sum(recovered)) %>% arrange(month) %>% arrange(as.integer(month))
world_data_by_countries['confirmed_rev_cumsum'] <- c(world_data_by_countries$confirmed[1],diff(world_data_by_countries$confirmed))
world_data_by_countries['death_rev_cumsum'] <- c(world_data_by_countries$death[1],diff(world_data_by_countries$death))
world_data_by_countries['recovered_rev_cumsum'] <- c(world_data_by_countries$recovered[1],diff(world_data_by_countries$recovered))
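The three `*_rev_cumsum` columns turn cumulative totals back into per-month increments: the first value is kept as it is and every later value is replaced by its difference from the previous one. A minimal sketch with made-up numbers shows the idea.

```r
# Made-up cumulative totals at the end of four consecutive months.
cumulative <- c(10, 25, 45, 80)

# First value stays, the rest become month-over-month differences.
monthly_new <- c(cumulative[1], diff(cumulative))
print(monthly_new)

# Each month's share of the overall total, in percent.
share <- 100 * monthly_new / sum(monthly_new)
print(share)
```

The increments add up to the last cumulative value, which is a quick sanity check for this step.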
template1 <- '
<div class="container-fluid bg-warning" style="padding:10px 20px;color: white;background-image: linear-gradient(to left bottom, #a6e90d, #58d056, #00b374, #00937d, #2e7171);">
<h4>Novel COVID 19 Stats Monthly status 2020 & 2021</h4>
<hr style="border-top: 1px solid white;">
<b>According to data of 2020 and 2021, each months observed,</b>
'
template2 <- '
<p>In the month of %s we have observed `%0.2f` %% total confirmed cases and a death rate of `%0.2f`</p>
'
mnths <- c("Jan", "February","March","April","May","June", "July", "August", "September", "October","November","December")
cat(template1)
According to data of 2020 and 2021, each months observed,
for (i in seq(nrow(world_data_by_countries))) {
current <- world_data_by_countries[i, ]
cat(sprintf(template2, mnths[as.integer(current$month)], (current$confirmed_rev_cumsum/sum(world_data_by_countries$confirmed_rev_cumsum))*100,(current$death_rev_cumsum/sum(world_data_by_countries$death_rev_cumsum))*100))
}
In the month of Jan we have observed 35.84 % total confirmed cases and a death rate of 42.13
In the month of February we have observed 3.90 % total confirmed cases and a death rate of 5.72
In the month of March we have observed 5.12 % total confirmed cases and a death rate of 5.52
In the month of April we have observed 7.80 % total confirmed cases and a death rate of 6.98
In the month of May we have observed 6.80 % total confirmed cases and a death rate of 7.00
In the month of June we have observed 3.91 % total confirmed cases and a death rate of 5.23
In the month of July we have observed 5.46 % total confirmed cases and a death rate of 4.98
In the month of August we have observed 6.89 % total confirmed cases and a death rate of 5.53
In the month of September we have observed 5.56 % total confirmed cases and a death rate of 4.85
In the month of October we have observed 4.53 % total confirmed cases and a death rate of 3.98
In the month of November we have observed 5.44 % total confirmed cases and a death rate of 3.98
In the month of December we have observed 8.74 % total confirmed cases and a death rate of 4.10
world_data <- data %>% filter(year==20)
world_data_by_countries <- world_data %>% group_by(Country.Region, month) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count), .groups = 'drop') %>% arrange(month)
world_data_by_countries <- world_data_by_countries %>% group_by(month) %>%
summarise(confirmed = sum(confirmed),
death = sum(death),
recovered = sum(recovered)) %>% arrange(month) %>% arrange(as.integer(month))
world_data_by_countries['confirmed_rev_cumsum'] <- c(world_data_by_countries$confirmed[1],diff(world_data_by_countries$confirmed))
world_data_by_countries['death_rev_cumsum'] <- c(world_data_by_countries$death[1],diff(world_data_by_countries$death))
world_data_by_countries['recovered_rev_cumsum'] <- c(world_data_by_countries$recovered[1],diff(world_data_by_countries$recovered))
template1 <- '
<div class="container-fluid bg-warning" style="padding:10px 20px;color: white; background-image: linear-gradient(to left bottom, #051937, #3c405e, #6e6c87, #a29cb3, #d7cfe1);">
<h4>Novel COVID 19 Stats Monthly status 2020</h4>
<hr style="border-top: 1px solid white;">
<b>According to data of 2020, each months observed,</b>
'
template2 <- '
<p>In the month of %s we have observed `%0.2f` %% total confirmed cases and a death rate of `%0.2f`</p>
'
mnths <- c("Jan", "February","March","April","May","June", "July", "August", "September", "October","November","December")
cat(template1)
According to data of 2020, each months observed,
world_data_by_countries_bkp <- world_data_by_countries
for (i in seq(nrow(world_data_by_countries))) {
current <- world_data_by_countries[i, ]
cat(sprintf(template2, mnths[as.integer(current$month)], (current$confirmed_rev_cumsum/sum(world_data_by_countries$confirmed_rev_cumsum))*100, (current$death_rev_cumsum/sum(world_data_by_countries$death_rev_cumsum))*100))
}
In the month of Jan we have observed 0.01 % total confirmed cases and a death rate of 0.01
In the month of February we have observed 0.08 % total confirmed cases and a death rate of 0.14
In the month of March we have observed 0.92 % total confirmed cases and a death rate of 2.22
In the month of April we have observed 2.84 % total confirmed cases and a death rate of 10.32
In the month of May we have observed 3.45 % total confirmed cases and a death rate of 7.96
In the month of June we have observed 5.15 % total confirmed cases and a death rate of 7.67
In the month of July we have observed 8.55 % total confirmed cases and a death rate of 9.44
In the month of August we have observed 9.54 % total confirmed cases and a death rate of 9.83
In the month of September we have observed 10.17 % total confirmed cases and a death rate of 9.00
In the month of October we have observed 14.47 % total confirmed cases and a death rate of 9.83
In the month of November we have observed 20.57 % total confirmed cases and a death rate of 14.71
In the month of December we have observed 24.26 % total confirmed cases and a death rate of 18.85
world_data <- data %>% filter(year==21)
world_data_by_countries <- world_data %>% group_by(Country.Region, month) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count), .groups = 'drop') %>% arrange(month)
world_data_by_countries <- world_data_by_countries %>% group_by(month) %>%
summarise(confirmed = sum(confirmed),
death = sum(death),
recovered = sum(recovered)) %>% arrange(month) %>% arrange(as.integer(month))
world_data_by_countries['confirmed'] <- world_data_by_countries$confirmed - sum(world_data_by_countries_bkp$confirmed_rev_cumsum)
world_data_by_countries['confirmed_rev_cumsum'] <- c(world_data_by_countries$confirmed[1],diff(world_data_by_countries$confirmed))
world_data_by_countries['death_rev_cumsum'] <- c(world_data_by_countries$death[1],diff(world_data_by_countries$death))
world_data_by_countries['recovered_rev_cumsum'] <- c(world_data_by_countries$recovered[1],diff(world_data_by_countries$recovered))
template1 <- '
<div class="container-fluid bg-warning" style="padding:10px 20px;color: white; background-image: linear-gradient(to right top, #051937, #004d7a, #008793, #00bf72, #a8eb12);">
<h4>Novel COVID 19 Stats Monthly status 2021</h4>
<hr style="border-top: 1px solid white;">
<b>According to data of 2021, each months observed,</b>
'
template2 <- '
<p>In the month of %s we have observed `%0.2f` %% total confirmed cases and a death rate of `%0.2f`</p>
'
mnths <- c("Jan", "February","March","April","May","June", "July", "August", "September", "October","November","December")
cat(template1)
According to data of 2021, each months observed,
for (i in seq(nrow(world_data_by_countries))) {
current <- world_data_by_countries[i, ]
cat(sprintf(template2, mnths[as.integer(current$month)], (current$confirmed_rev_cumsum/sum(world_data_by_countries$confirmed_rev_cumsum))*100,(current$death_rev_cumsum/sum(world_data_by_countries$death_rev_cumsum))*100))
}
In the month of Jan we have observed 9.54 % total confirmed cases and a death rate of 42.13
In the month of February we have observed 5.50 % total confirmed cases and a death rate of 5.72
In the month of March we have observed 7.22 % total confirmed cases and a death rate of 5.52
In the month of April we have observed 11.00 % total confirmed cases and a death rate of 6.98
In the month of May we have observed 9.59 % total confirmed cases and a death rate of 7.00
In the month of June we have observed 5.52 % total confirmed cases and a death rate of 5.23
In the month of July we have observed 7.70 % total confirmed cases and a death rate of 4.98
In the month of August we have observed 9.71 % total confirmed cases and a death rate of 5.53
In the month of September we have observed 7.83 % total confirmed cases and a death rate of 4.85
In the month of October we have observed 6.39 % total confirmed cases and a death rate of 3.98
In the month of November we have observed 7.67 % total confirmed cases and a death rate of 3.98
In the month of December we have observed 12.32 % total confirmed cases and a death rate of 4.10
world_data <- data
world_data_by_countries <- world_data %>% group_by(Country.Region, month) %>%
summarise(confirmed = max(confirmed_count),
death = max(death_count),
recovered = max(recovered_count), .groups = 'drop') %>% arrange(month)
c1 <- world_data_by_countries %>% filter(month==c(10))
c2 <- world_data_by_countries %>% filter(month==c(12))
c2$confirmed <- c2$confirmed-c1$confirmed
c2$death <- c2$death-c1$death
c2 <- top_n(c2, 10, confirmed) %>% arrange(-confirmed)
c2
template1 <- '
<div class="container-fluid bg-warning" style="padding:10px 20px;color: white; background-image: linear-gradient(to right top, #051937, #004d7a, #008793, #00bf72, #a8eb12);">
<h4>Country with highest confirmed/death rates in last Quarter(Q4)</h4>
<hr style="border-top: 1px solid white;">
<b>According to data in Quarter(Q4),</b>
'
template2 <- '
<p>In %s we have observed `%0.2f` %% total confirmed cases and a death rate of `%0.2f`</p>
'
cat(template1)
According to data in Quarter(Q4),
for (i in seq(nrow(c2))) {
current <- c2[i, ]
cat(sprintf(template2, current$Country.Region, (current$confirmed/sum(c2$confirmed))*100,(current$death/sum(c2$death))*100))
}
In US we have observed 33.68 % total confirmed cases and a death rate of 36.45
In United Kingdom we have observed 14.85 % total confirmed cases and a death rate of 3.62
In France we have observed 10.61 % total confirmed cases and a death rate of 2.68
In Germany we have observed 9.77 % total confirmed cases and a death rate of 7.34
In Russia we have observed 7.43 % total confirmed cases and a death rate of 31.03
In Turkey we have observed 5.55 % total confirmed cases and a death rate of 5.32
In Italy we have observed 5.18 % total confirmed cases and a death rate of 2.40
In Spain we have observed 4.91 % total confirmed cases and a death rate of 0.92
In Poland we have observed 4.15 % total confirmed cases and a death rate of 9.09
In Netherlands we have observed 3.86 % total confirmed cases and a death rate of 1.14
Because the counts are cumulative, the monthly values for the two years have the following form:

| 2020 | 2021 |
|---|---|
| x | x + y + … + z |
| x + y | x + y + … + z + v |
| … | … |
| … | … |

Summing each row across the two years gives

$$x + (x + y + \dots + z) \quad (1)$$

$$(x + y) + (x + y + \dots + z + v) \quad (2)$$

Subtracting, $(2) - (1) = y + v$, which is the sum of the individual values of that month in each year.

Now, let’s look at the top countries that have been affected, considering their population. We used an additional dataset for this investigation, which contains the name of each nation and its population. Here we joined the COVID-19 data with the population data.
Ratio of total affected people vs population
top_20_countries <- top_20_countries_c
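library(readr)
# Population dataset; csvData must be loaded before it is used below
csvData <- read_csv("csvData.csv")
# csvData$pop2022 <- csvData$pop2022 *1000
# Rename two countries so that they match the names used in the COVID-19 data
csvData$country[csvData$country == "United States"] <- "US"
csvData$country[csvData$country == "South Korea"] <- "Korea, South"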
top_20_countries <- merge(x =top_20_countries , y = csvData, by.x = c("Country.Region"), by.y=c("country"))
top_20_countries <- top_20_countries[order(-top_20_countries$confirmed),]
top_20_countries['confirmed_to_pop_ratio'] <- top_20_countries$confirmed/top_20_countries$pop2022
top_20_countries['death_to_pop_ratio'] <- top_20_countries$death/top_20_countries$pop2022
top_20_countries <- top_20_countries[order(-top_20_countries$confirmed_to_pop_ratio),]
top_20_countries[c('Country.Region', 'confirmed', 'death', 'confirmed_to_pop_ratio')]
Ratio of total deaths vs population
top_20_countries <- top_20_countries[order(-top_20_countries$death_to_pop_ratio),]
top_20_countries[c('Country.Region', 'confirmed', 'death', 'death_to_pop_ratio')]
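Raw ratios of cases to population are small fractions that can be hard to read. A common presentation is cases per 100,000 inhabitants. The sketch below shows that scaling on made-up numbers; the column name `population` and the values are assumptions for illustration, not taken from the population dataset.

```r
# Made-up totals and populations for two hypothetical countries.
countries <- data.frame(
  Country.Region = c("A", "B"),
  confirmed      = c(5000, 2000),
  death          = c(50, 40),
  population     = c(1e6, 2e5)
)

# Scale the raw ratios to "per 100,000 inhabitants".
countries$confirmed_per_100k <- countries$confirmed / countries$population * 1e5
countries$death_per_100k     <- countries$death / countries$population * 1e5

# Highest confirmed rate first.
countries <- countries[order(-countries$confirmed_per_100k), ]
print(countries)
```

Country B has fewer cases in absolute terms but the higher rate once population is taken into account, which is the point of the per-capita view.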
Now let’s select a few top countries and analyze the data in more depth. First, let’s consider the United States, the country that has shown a very high number of confirmed cases.
library(ggplot2)
data_by_country <- data
data_by_country$Date <- data_by_country$Date %>% as.Date("%m.%d.%y")
country <- data_by_country %>% group_by(Country.Region) %>% mutate(cumconfirmed=cumsum(confirmed_count), days = Date - first(Date) + 1)
US <- country %>% filter(Country.Region=="US")
country
ggplot(US, aes(x=days, y=confirmed_count)) + geom_line(color="red") +
theme_classic() +
labs(title = "Covid-19 United States Confirmed Cases", x= "Days", y= "Cumulative confirmed cases") +
theme(plot.title = element_text(hjust = 0.5))
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
ggplot(US, aes(x=days, y=death_count)) + geom_line(color="red") +
theme_classic() +
labs(title = "Covid-19 United States Death Cases", x= "Days", y= "Cumulative deaths") +
theme(plot.title = element_text(hjust = 0.5))
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
ggplot(US, aes(x=days, y=recovered_count)) + geom_line(color="red") +
theme_classic() +
labs(title = "Covid-19 United States Recovered Cases", x= "Days", y= "Cumulative recovered cases") +
theme(plot.title = element_text(hjust = 0.5))
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
drop <- c("Province.State")
country = country[,!(names(country) %in% drop)]
# Some inconsistency with UK data, hence ignoring
country <- country %>% filter(!Country.Region=="United Kingdom")
# %in% tests membership; == would recycle the vector and compare position by position
country <- country %>% filter(Country.Region %in% top_20_countries$Country.Region)
world_perspective <- ggplot(country, aes(x=days, y=confirmed_count, group=Country.Region, color=Country.Region)) + geom_line() + labs(title = "Covid-19 Confirmed Cases in world perspective", x= "Days", y= "Cumulative confirmed cases") +
theme(plot.title = element_text(hjust = 0.5))
world_perspective
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
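When filtering rows against a vector of country names, `==` and `%in%` behave differently: `==` recycles the shorter vector and compares position by position, while `%in%` tests whether each value appears anywhere in the set. The sketch below illustrates the difference on a small made-up vector.

```r
# Six rows of country labels and a set of two wanted countries (illustrative).
countries <- c("US", "India", "Brazil", "US", "India", "Brazil")
wanted    <- c("India", "US")

# == recycles `wanted` to India, US, India, US, ... and compares element by element.
eq_mask <- countries == wanted

# %in% checks membership in the set, independent of position.
in_mask <- countries %in% wanted

print(countries[eq_mask])
print(countries[in_mask])
```

With `==`, only the rows that happen to line up with the recycled vector are kept, so most matching rows are silently lost; `%in%` keeps all of them.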
The global dataset has no information on how COVID-19 affected the individual states of the United States, and for many countries such regional data is not available at all. For some countries, including the US, a separate dataset with this breakdown exists, which we load here.
us_data_confirmed <- read.csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
us_data_confirmed <- us_data_confirmed %>% pivot_longer(cols = starts_with("X"), names_to = "Date", values_to = "confirmed_count")
us_data_confirmed$Date <- substr(us_data_confirmed$Date,2,20)
us_data_confirmed <- us_data_confirmed %>% group_by(Province_State) %>% summarise(confirmed=max(confirmed_count), Lat=median(Lat), Long_=median(Long_))
lng<-mean(us_data_confirmed$Long_)
lat<-mean(us_data_confirmed$Lat)
pal = colorNumeric(
palette = "viridis",
domain = us_data_confirmed$`confirmed`
)
leaflet(us_data_confirmed) %>% addTiles() %>%
addCircleMarkers(lng = ~Long_, lat = ~Lat,
label = ~Province_State,
color=~pal(us_data_confirmed$confirmed),
radius= ~confirmed*0.000015)%>%
addLegend( "bottomright", pal = pal, values = ~confirmed,
title = "Total Affected",
labFormat = labelFormat(prefix = " "),
opacity = 0.75)%>%
setView(lat= 35, lng=-100,zoom=4)
data_by_country <- data
data_by_country$Date <- data_by_country$Date %>% as.Date("%m.%d.%y")
country <- data_by_country %>% group_by(Country.Region) %>% mutate(cumconfirmed=cumsum(confirmed_count), days = Date - first(Date) + 1)
Australia <- country %>% filter(Country.Region=="Australia") %>% group_by(Date) %>% mutate(confirmed_count=sum(confirmed_count),
death_count=sum(death_count),
recovered_count=sum(recovered_count))
ggplot(Australia, aes(x=days, y=confirmed_count)) + geom_line(color="red") +
theme_classic() +
labs(title = "Covid-19 Australia Confirmed Cases", x= "Days", y= "Cumulative confirmed cases") +
theme(plot.title = element_text(hjust = 0.5))
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
ggplot(Australia, aes(x=days, y=death_count)) + geom_line(color="red") +
theme_classic() +
labs(title = "Covid-19 Australia Death Cases", x= "Days", y= "Cumulative deaths") +
theme(plot.title = element_text(hjust = 0.5))
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
ggplot(Australia, aes(x=days, y=recovered_count)) + geom_line(color="red") +
theme_classic() +
labs(title = "Covid-19 Australia Recovered Cases", x= "Days", y= "Cumulative recovered cases") +
theme(plot.title = element_text(hjust = 0.5))
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
# Some inconsistency with UK data, hence ignoring
country <- country %>% filter(!Country.Region=="United Kingdom")
# %in% tests membership; == would recycle the vector and compare position by position
country <- country %>% filter(Country.Region %in% top_20_countries$Country.Region)
world_perspective <- ggplot(country, aes(x=days, y=confirmed_count, group=Country.Region, color=Country.Region)) + geom_line() + theme_classic() +
labs(title = "Covid-19 Confirmed Cases in world perspective", x= "Days", y= "Cumulative confirmed cases") +
theme(plot.title = element_text(hjust = 0.5))
world_perspective
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
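When rows are filtered against a vector of names, `==` compares element by element and recycles the shorter vector, while `%in%` tests membership. A minimal base R sketch shows the difference; the vectors `countries` and `top` are made up for illustration and are not taken from the data above.

```r
# Hypothetical example vectors; not part of the report's data.
countries <- c("India", "Brazil", "India", "France", "Brazil", "France")
top <- c("Brazil", "India")

# `==` recycles `top` along `countries`, so a match depends on row position.
by_equality <- countries[countries == top]

# `%in%` keeps every element that appears anywhere in `top`.
by_membership <- countries[countries %in% top]

print(by_equality)
print(by_membership)
```

With `==` only one of the four matching entries survives here, which is why membership tests are usually the safer choice for this kind of filter.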
Here is a map of how COVID-19 affected the different provinces of Australia.
data_by_country <- data
data_by_country$Date <- data_by_country$Date %>% as.Date("%m.%d.%y")
country <- data_by_country %>% group_by(Country.Region) %>% mutate(cumconfirmed=cumsum(confirmed_count), days = Date - first(Date) + 1)
us_data_confirmed <- country %>% filter(Country.Region=="Australia")
us_data_confirmed <- us_data_confirmed %>% group_by(Province.State) %>% summarise(confirmed=max(confirmed_count), Lat=median(Lat), Long_=median(Long))
lng<-mean(us_data_confirmed$Long_)
lat<-mean(us_data_confirmed$Lat)
pal = colorNumeric(
palette = "viridis",
domain = us_data_confirmed$`confirmed`
)
leaflet(us_data_confirmed) %>% addTiles() %>%
addCircleMarkers(lng = ~Long_, lat = ~Lat,
label = ~Province.State,
color=~pal(us_data_confirmed$confirmed),
radius= ~confirmed*0.000025)%>%
addLegend( "bottomright", pal = pal, values = ~confirmed,
title = "Total Affected",
labFormat = labelFormat(prefix = " "),
opacity = 0.75)%>%
setView(lat= -30, lng=140,zoom=4)
data_by_country <- data
data_by_country$Date <- data_by_country$Date %>% as.Date("%m.%d.%y")
country <- data_by_country %>% group_by(Country.Region) %>% mutate(cumconfirmed=cumsum(confirmed_count), days = Date - first(Date) + 1)
# %in% tests membership; == would recycle the vector and drop matching rows
country <- country %>% filter(Country.Region %in% top_20_countries$Country.Region)
world_perspective <- ggplot(country, aes(x=days, y=confirmed_count, group=Country.Region, color=Country.Region)) + geom_line() +
theme_classic() +
labs(title = "Covid-19 Confirmed Cases in world perspective", x= "Days", y= "Daily confirmed cases") +
theme(plot.title = element_text(hjust = 0.5)) + facet_wrap(~Country.Region)
world_perspective
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
world_perspective <- ggplot(country, aes(x=days, y=death_count, group=Country.Region, color=Country.Region)) + geom_line() +
theme_classic() +
labs(title = "Covid-19 Death Cases in world perspective", x= "Days", y= "Daily deaths") +
theme(plot.title = element_text(hjust = 0.5)) + facet_wrap(~Country.Region)
world_perspective
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
world_perspective <- ggplot(country, aes(x=days, y=recovered_count, group=Country.Region, color=Country.Region)) + geom_line() +
theme_classic() +
labs(title = "Covid-19 Recovered Cases in world perspective", x= "Days", y= "Daily recovered cases") +
theme(plot.title = element_text(hjust = 0.5)) + facet_wrap(~Country.Region)
world_perspective
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
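The repeated message about `difftime` appears because subtracting two dates yields a `difftime` object, which ggplot2 has no default scale for. A small base R sketch, using invented dates in place of the `Date` column, shows one way to turn the day counter into plain numbers before plotting:

```r
# Hypothetical dates standing in for the Date column.
dates <- as.Date(c("2020-01-22", "2020-01-23", "2020-01-25"))

# Subtracting dates yields a difftime object, as in the `days` column above.
days_difftime <- dates - dates[1] + 1

# Converting to plain numbers gives an ordinary continuous variable.
days_numeric <- as.numeric(days_difftime, units = "days")

print(class(days_difftime))
print(days_numeric)
```

Mapping a numeric column like `days_numeric` to the x axis should avoid the message without changing the shape of the plot.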
In this section we analyze the situation in India. Since the data required for this analysis is not present in the CSSEGISandData/COVID-19 repo, we use another dataset for this purpose.
str(covid)
## 'data.frame': 18110 obs. of 9 variables:
## $ Sno : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Date : chr "2020-01-30" "2020-01-31" "2020-02-01" "2020-02-02" ...
## $ Time : chr "6:00 PM" "6:00 PM" "6:00 PM" "6:00 PM" ...
## $ State.UnionTerritory : chr "Kerala" "Kerala" "Kerala" "Kerala" ...
## $ ConfirmedIndianNational : chr "1" "1" "2" "3" ...
## $ ConfirmedForeignNational: chr "0" "0" "0" "0" ...
## $ Cured : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Deaths : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Confirmed : int 1 1 2 3 3 3 3 3 3 3 ...
str(testing)
## 'data.frame': 16336 obs. of 5 variables:
## $ Date : chr "2020-04-17" "2020-04-24" "2020-04-27" "2020-05-01" ...
## $ State : chr "Andaman and Nicobar Islands" "Andaman and Nicobar Islands" "Andaman and Nicobar Islands" "Andaman and Nicobar Islands" ...
## $ TotalSamples: num 1403 2679 2848 3754 6677 ...
## $ Negative : int 1210 NA NA NA NA NA NA NA NA NA ...
## $ Positive : num 12 27 33 33 33 33 33 33 33 33 ...
str(vaccine)
## 'data.frame': 7644 obs. of 24 variables:
## $ Updated.On : chr "16/01/2021" "17/01/2021" "18/01/2021" "19/01/2021" ...
## $ State : chr "India" "India" "India" "India" ...
## $ Total.Doses.Administered : num 48276 58604 99449 195525 251280 ...
## $ Sessions : num 3455 8532 13611 17855 25472 ...
## $ Sites : num 2957 4954 6583 7951 10504 ...
## $ First.Dose.Administered : num 48276 58604 99449 195525 251280 ...
## $ Second.Dose.Administered : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Male..Doses.Administered. : num NA NA NA NA NA NA NA NA NA NA ...
## $ Female..Doses.Administered. : num NA NA NA NA NA NA NA NA NA NA ...
## $ Transgender..Doses.Administered. : num NA NA NA NA NA NA NA NA NA NA ...
## $ Covaxin..Doses.Administered. : num 579 635 1299 3017 3946 ...
## $ CoviShield..Doses.Administered. : num 47697 57969 98150 192508 247334 ...
## $ Sputnik.V..Doses.Administered. : num NA NA NA NA NA NA NA NA NA NA ...
## $ AEFI : num NA NA NA NA NA NA NA NA NA NA ...
## $ X18.44.Years..Doses.Administered. : num NA NA NA NA NA NA NA NA NA NA ...
## $ X45.60.Years..Doses.Administered. : num NA NA NA NA NA NA NA NA NA NA ...
## $ X60..Years..Doses.Administered. : num NA NA NA NA NA NA NA NA NA NA ...
## $ X18.44.Years.Individuals.Vaccinated.: num NA NA NA NA NA NA NA NA NA NA ...
## $ X45.60.Years.Individuals.Vaccinated.: num NA NA NA NA NA NA NA NA NA NA ...
## $ X60..Years.Individuals.Vaccinated. : num NA NA NA NA NA NA NA NA NA NA ...
## $ Male.Individuals.Vaccinated. : num 23757 27348 41361 81901 98111 ...
## $ Female.Individuals.Vaccinated. : num 24517 31252 58083 113613 153145 ...
## $ Transgender.Individuals.Vaccinated. : num 2 4 5 11 24 38 80 103 128 201 ...
## $ Total.Individuals.Vaccinated : num 48276 58604 99449 195525 251280 ...
In this section we analyze the Indian data from several angles.
full_covid_data <- inner_join(covid,testing, by=c("Date"="Date","State.UnionTerritory"="State"))
full_covid_data[is.na(full_covid_data)] <- 0
top_affected <- full_covid_data %>% group_by(State.UnionTerritory) %>% summarise(Cured=max(Cured), Deaths=max(Deaths), Confirmed=max(Confirmed)) %>%
select(State.UnionTerritory,Cured,Deaths,Confirmed) %>%
arrange(desc(Confirmed)) %>% top_n(10)
## Selecting by Confirmed
ta <- as.vector(top_affected[['State.UnionTerritory']])
full_covid_data$Date <- as.Date(full_covid_data$Date)
full_covid_data %>%
filter(State.UnionTerritory %in% ta) %>%
ggplot(aes(x=Date,y=Confirmed)) + geom_line(aes(color=State.UnionTerritory),size=1.2)+
scale_x_date(limit=c(as.Date("2020-04-01"),as.Date("2021-08-11"))) +
theme_classic() +
scale_y_continuous(labels=scales :: number_format(accuracy=1))+
labs(title='Time Series for Confirmed Cases',subtitle = 'Top affected states')+
xlab(label='Time Period') +
ylab(label='Confirmed Cases') +
scale_color_viridis_d()
full_covid_data$Active = (full_covid_data$Confirmed-(full_covid_data$Deaths + full_covid_data$Cured))
full_covid_data %>%
filter(State.UnionTerritory %in% ta) %>%
ggplot(aes(x=Date,y=Active)) + geom_line(aes(color=State.UnionTerritory),size=1.2)+
scale_x_date(limit=c(as.Date("2020-04-01"),as.Date("2021-05-07"))) +
scale_y_continuous(labels=scales :: number_format(accuracy=1))+
labs(title='Time Series for Active Cases',subtitle = 'Top 10 worst affected states')+
theme_classic() +
xlab(label='Time Period') +
ylab(label='Active Cases') +
scale_color_viridis_d()
full_covid_data %>%
filter(Date==max(Date)) %>%
ggplot(aes(x=Confirmed,y=State.UnionTerritory))+geom_col(fill='red',alpha=0.8)+
scale_x_continuous(labels=scales :: number_format(accuracy=1))+
theme_minimal() +
labs(title="Total Confirmed cases grouped by states")
full_covid_data %>%
filter(Date==max(Date)) %>%
ggplot(aes(x=Active,y=State.UnionTerritory))+geom_col(fill='green',alpha=0.8)+
scale_x_continuous(labels=scales :: number_format(accuracy=1))+
theme_light() +
labs(title="Total Active cases grouped by states")
full_covid_data %>%
filter(Date==max(Date)) %>%
ggplot(aes(x=Deaths,y=State.UnionTerritory))+geom_col()+
scale_fill_viridis_d() +
scale_x_continuous(labels=scales :: number_format(accuracy=1))+
theme_light() +
labs(title="Total Deaths grouped by states")
Here is the detailed plot of the growth of COVID-19 in India.
india<-full_covid_data %>%
group_by(Date) %>%
summarise(Cured_tot=sum(Cured),
Deaths_tot=sum(Deaths),
Confirmed_tot=sum(Confirmed),
Active_tot=sum(Active))
plot_india <- india %>%
ggplot(aes(x=Date,y=Confirmed_tot)) + geom_line(color='blue',size=1) +
labs(title="Time series for Confirmed Cases")+
theme_linedraw() +
xlab(label ="Time Period") +
ylab(label="Confirmed Cases") +
scale_y_continuous(labels = scales :: number_format(accuracy=1))
plot_india
library(lubridate)
# transmute() in dplyr adds new variables, especially computed ones.
# Unlike mutate(), transmute() removes all other columns by default.
# A common data wrangling task is to create new columns using computations on existing columns.
tbl_covid_19_india <- covid
colnames(tbl_covid_19_india) <- sub("/", "", colnames(tbl_covid_19_india), fixed = TRUE)
tbl_covid_19_india <- tbl_covid_19_india %>% mutate(new_date = ymd(Date)) %>%
transmute(
Sno = Sno,
Date = new_date,
StateUnionTerritory = State.UnionTerritory,
ConfirmedIndianNational = ConfirmedIndianNational,
ConfirmedForeignNational = ConfirmedForeignNational,
Cured = Cured,
Deaths = Deaths,
Confirmed = Confirmed
)
# tbl_covid_19_india
tbl_deaths_percentage_1 <- inner_join(tbl_covid_19_india,
tbl_covid_19_india %>% group_by(StateUnionTerritory) %>%
summarise(max_date = max(Date)) %>% ungroup() %>%
transmute(StateUnionTerritory = StateUnionTerritory,
Date = max_date), by = c("StateUnionTerritory", "Date"))
# tbl_deaths_percentage_1
tbl_deaths_percentage <- mutate(tbl_deaths_percentage_1,
new_StateUnionTerritory = str_replace(StateUnionTerritory, "#", ""),
new_StateUnionTerritory1 = str_replace(new_StateUnionTerritory, "Andaman and Nicobar Islands", "Andaman & Nicobar")) %>%
transmute(state = new_StateUnionTerritory1,
Date = Date,
Cured = Cured,
Deaths = Deaths,
Confirmed = Confirmed)
# tbl_deaths_percentage
# COVID 19 India - Case Fatality Rate - % of Deaths/Confirmed Cases
p_death <- tbl_deaths_percentage %>% group_by(state) %>%
summarise(sum_cured = sum(Cured),
sum_deaths = sum(Deaths),
sum_confirmed = sum(Confirmed),
deaths_perc = round(sum(Deaths)/sum(Confirmed)*100, digits = 2)) %>%
filter(deaths_perc != 0) %>%
ggplot(mapping = aes(x = reorder(state, deaths_perc), y = deaths_perc)) +
geom_bar(mapping = aes(fill = state), stat = "identity", show.legend = FALSE) +
coord_flip() +
xlab("States/Union Territories") +
ylab("% of Deaths/Confirmed") +
ggtitle("Case Fatality Rate - % of Deaths/Confirmed Cases") +
scale_fill_viridis_d() + theme_minimal()
p_death
# COVID 19 India - % of Cured/Confirmed Cases
p_cured <- tbl_deaths_percentage %>% group_by(state) %>%
summarise(sum_cured = sum(Cured),
sum_deaths = sum(Deaths),
sum_confirmed = sum(Confirmed),
cured_perc = round(sum(Cured)/sum(Confirmed)*100, digits = 2)) %>%
filter(cured_perc != 0) %>% mutate(rown = row_number(desc(cured_perc))) %>% filter(rown <= 25) %>%
ggplot(mapping = aes(x = reorder(state, cured_perc), y = cured_perc)) +
geom_bar(mapping = aes(fill = state), stat = "identity", show.legend = FALSE) +
coord_flip() +
xlab("States/Union Territories") +
ylab("% of Cured/Confirmed") +
ggtitle("Case Cured Rate - % of Cured/Confirmed Cases") +
scale_fill_viridis_d() + theme_minimal()
p_cured
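The two percentages plotted above, and the active case count used earlier, are simple ratios and differences of the per-state totals. A worked sketch on invented numbers (the states "A" and "B" and all figures are hypothetical) makes the arithmetic explicit:

```r
# Hypothetical state totals; the numbers are invented for illustration.
totals <- data.frame(
  state = c("A", "B"),
  Confirmed = c(2000, 500),
  Deaths = c(30, 5),
  Cured = c(1900, 450)
)

# Case fatality rate: deaths as a percentage of confirmed cases.
totals$deaths_perc <- round(totals$Deaths / totals$Confirmed * 100, digits = 2)

# Cured rate: cured as a percentage of confirmed cases.
totals$cured_perc <- round(totals$Cured / totals$Confirmed * 100, digits = 2)

# Active cases: confirmed cases that are neither cured nor deceased.
totals$Active <- totals$Confirmed - (totals$Deaths + totals$Cured)

print(totals)
```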
tbl_state_testing_details <- testing
tbl_state_testing_details <- transmute(tbl_state_testing_details,
Date = Date,
State = State,
TotalSamples = replace_na(TotalSamples, 0),
Negative = replace_na(Negative, 0),
Positive = replace_na(Positive, 0)
)
p_testing_details <- tbl_state_testing_details %>% filter(TotalSamples != 0) %>% group_by(State) %>%
filter(Date == max(Date)) %>%
ungroup() %>% transmute(
Date = Date,
State = State,
Negative = ifelse(Negative == 0, TotalSamples - Positive, Negative),
Positive = ifelse(Positive == 0, TotalSamples - Negative, Positive),
TotalSamples = Negative + Positive
) %>%
pivot_longer(c(Negative, Positive), names_to = "type", values_to = "Samples") %>%
ggplot(mapping = aes(x = reorder(State, desc(TotalSamples)), y = Samples)) +
geom_col(mapping = aes(fill = type), position = position_stack(reverse = TRUE), show.legend = TRUE) +
scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
coord_flip() +
ylab("Total Samples Tested") +
xlab("State") +
ggtitle("Testing Volumes by States") +
scale_fill_manual(values = c("orange", "red"))
p_testing_details
p_ratio_positive_tests <- tbl_state_testing_details %>% filter(TotalSamples != 0, Positive != 0) %>% group_by(State) %>%
filter(Date == max(Date)) %>%
ungroup() %>%
mutate(Positive_test_ratio = round(Positive/TotalSamples, digits = 2),
rown = row_number(desc(Positive_test_ratio))) %>%
filter(rown <= 20) %>%
ggplot(mapping = aes(x = reorder(State, desc(-Positive_test_ratio)), y = Positive_test_ratio)) +
geom_bar(mapping = aes(fill = State), stat = "identity", show.legend = FALSE) +
coord_flip() +
ylab("Ratio of Positive Samples Tested") +
xlab("State") +
ggtitle("Test positivity by State") +
scale_fill_viridis_d()
p_ratio_positive_tests
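The `Updated.On` column of the vaccine table holds strings such as "16/01/2021", that is day/month/four-digit year. In `as.Date()`, `%y` reads a two-digit year and `%Y` a four-digit one, so the choice of format changes the parsed year. A small sketch on two sample strings copied from the `str(vaccine)` output:

```r
# Two sample values in the same layout as Updated.On.
x <- c("16/01/2021", "17/01/2021")

# %y consumes only "20" as the year and ignores the trailing "21".
two_digit <- as.Date(x, format = "%d/%m/%y")

# %Y consumes the full four-digit year.
four_digit <- as.Date(x, format = "%d/%m/%Y")

print(two_digit)
print(four_digit)
```

Comparing the two results is a quick check before filtering or plotting by date.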
vaccine_na <- subset(vaccine, !is.na(Total.Doses.Administered))
# Updated.On uses four-digit years (e.g. "16/01/2021"), so the format needs %Y, not %y
vaccine_na$Updated.On <- as.Date(vaccine_na$Updated.On,format="%d/%m/%Y")
vaccine_na <- vaccine_na %>% filter(State != 'India')
top_vaccine <- vaccine_na %>%
filter(Updated.On ==max(Updated.On )) %>%
select(State,Total.Doses.Administered) %>%
arrange(desc(Total.Doses.Administered)) %>%
top_n(5)
## Selecting by Total.Doses.Administered
tv <- top_vaccine[['State']]
tv[6] <- 'Kerala'
vaccine_na <- rename(vaccine_na,Date = Updated.On)
vaccine_na %>%
filter(State %in% tv) %>%
ggplot(aes(x=Date,y=Total.Doses.Administered)) + geom_line(aes(color=State))+
labs(title="Time Series for Doses Administered")
Now let’s look into the age-wise and gender-wise distribution of vaccination across the country.
First, let’s analyze the overall distribution of vaccine doses.
vaccination_data <- vaccine_na %>%
group_by(Date) %>%
summarise(Date,tot = sum(Total.Doses.Administered),
tot_cv=sum(Covaxin..Doses.Administered.),
tot_cs=sum(CoviShield..Doses.Administered.),
tot_m=sum(Male..Doses.Administered.),
tot_f=sum(Female..Doses.Administered.),
tot_t=sum(Transgender..Doses.Administered.),
tot_i=sum(Total.Individuals.Vaccinated)) %>%
summarise(Total_dose=mean(tot),
Total_covaxi = mean(tot_cv),
Total_covis =mean(tot_cs),
Total_Male = mean(tot_m),
Total_Female = mean(tot_f),
Total_Transgender = mean(tot_t),
Total_vaccinated = mean(tot_i))
## `summarise()` has grouped output by 'Date'. You can override using the `.groups`
## argument.
vaccination_data %>%
ggplot(aes(x=Date)) + geom_area(aes(y=Total_dose,color='green'),fill='green',alpha=.3) +
geom_area(aes(y=Total_Male,color='blue'),fill='blue',alpha=.3) +
geom_area(aes(y=Total_Female,color='red'),fill='red',alpha=.3) +
geom_area(aes(y=Total_Transgender,color='yellow'),fill='black',alpha=1) +
labs(title="Time series for Vaccinated") +
xlab(label ="Time Period") +
ylab(label="Total Vaccinated") +
scale_y_continuous(labels = scales :: number_format(accuracy=1))+
theme(legend.position="right")+
scale_color_identity(name = "Legend",
breaks = c("green", "blue", "red","yellow"),
labels = c("Total Vaccinated", "Men", "Women","Transgender"),
guide = "legend")
vaccine_bar <- vaccine_na %>%
# Snapshot for 16 March 2021 (matches the corrected %Y parsing above)
filter(Date =="2021-03-16") %>%
select(State, X18.44.Years.Individuals.Vaccinated., X45.60.Years.Individuals.Vaccinated., X60..Years.Individuals.Vaccinated.)
vaccine_bar <- vaccine_bar %>% pivot_longer(cols = starts_with("X"), names_to = "Age Group", values_to = "value")
vaccine_bar <- subset(vaccine_bar, !is.na(value))
vaccine_bar %>%
filter(State %in% tv) %>%
ggplot(aes(x=State,y=value,fill=`Age Group`))+geom_bar(stat='identity',position = 'fill') +
scale_fill_discrete(name='Age Group',
breaks=c('X18.44.Years.Individuals.Vaccinated.', 'X45.60.Years.Individuals.Vaccinated.','X60..Years.Individuals.Vaccinated.'),
labels=c('18 to 44','45 to 60','>60')) +
ylab('Percentage') +
theme_classic()+
labs(title='Age group distribution')